Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources

Authors

  • Doina Caragea
  • Jyotishman Pathak
  • Jie Bao
  • Adrian Silvescu
  • Carson M. Andorf
  • Drena Dobbs
  • Vasant Honavar
Abstract

We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal structure, and query interfaces) as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. We used the INDUS framework to design algorithms for learning probabilistic models (e.g., Naive Bayes models) that predict the GO functional classification of a protein from training sequences distributed among the SWISSPROT and MIPS data sources. Mappings such as EC2GO and MIPS2GO were used to resolve the semantic differences between these data sources when answering queries posed by the learning algorithms. Our results show that INDUS can be successfully used for the integrative analysis of data from multiple sources that collaborative discovery in computational biology requires.
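The idea of learning a Naive Bayes model from sources that stay where they are can be illustrated with a minimal sketch. It is not the INDUS implementation. It assumes that each source only returns counts, computed after its local labels are translated into the user's ontology, and that a multinomial model over amino-acid letters is an acceptable stand-in for the paper's sequence features. The mapping tables, identifiers, and toy sequences below are hypothetical placeholders, not entries from the real EC2GO or MIPS2GO files.

```python
import math
from collections import Counter, defaultdict

# Hypothetical label mappings into the user's ontology (placeholder terms).
EC2GO = {"EC:2.7.11.1": "GO:kinase", "EC:3.4.21.4": "GO:peptidase"}
MIPS2GO = {"MIPS:01.04": "GO:kinase", "MIPS:14.13": "GO:peptidase"}


def local_counts(records, mapping):
    """Answer a count query at one source, in the user's vocabulary."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    for sequence, label in records:
        term = mapping.get(label)
        if term is None:  # no mapping available: the record is skipped
            continue
        class_counts[term] += 1
        feature_counts[term].update(sequence)  # count amino-acid letters
    return class_counts, feature_counts


def merge(stats):
    """Add up the counts returned by the individual sources."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    for source_classes, source_features in stats:
        class_counts.update(source_classes)
        for term, counts in source_features.items():
            feature_counts[term].update(counts)
    return class_counts, feature_counts


def predict(sequence, class_counts, feature_counts, alphabet_size=20):
    """Multinomial Naive Bayes with add-one smoothing over the merged counts."""
    total = sum(class_counts.values())
    best_term, best_score = None, -math.inf
    for term, n in class_counts.items():
        denom = sum(feature_counts[term].values()) + alphabet_size
        score = math.log(n / total)
        for letter in sequence:
            score += math.log((feature_counts[term][letter] + 1) / denom)
        if score > best_score:
            best_term, best_score = term, score
    return best_term


# Toy data standing in for the two sources; raw records never leave a source.
swissprot = [("KKKR", "EC:2.7.11.1"), ("DDDE", "EC:3.4.21.4")]
mips = [("KRKK", "MIPS:01.04"), ("EDDD", "MIPS:14.13"), ("AAAA", "MIPS:99")]

class_counts, feature_counts = merge(
    [local_counts(swissprot, EC2GO), local_counts(mips, MIPS2GO)]
)
prediction = predict("KKRK", class_counts, feature_counts)
```

Because only counts cross the source boundary, the merged model is the same one a learner would obtain from a single pooled table, which is what makes a central warehouse unnecessary in this sketch.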


Similar resources

Algorithms and Software for Collaborative Discovery from Autonomous, Semantically Heterogeneous, Distributed Information Sources

Development of high throughput data acquisition technologies, together with advances in computing and communications, has resulted in an explosive growth in the number, size, and diversity of potentially useful information sources. This has resulted in unprecedented opportunities in data-driven knowledge acquisition and decision-making in a number of emerging increasingly data-rich application ...


Ontology Design Patterns for Large-Scale Data Interchange and Discovery

Data and information integration remains a major challenge for our modern information-driven society whereby people and organizations often have to deal with large data volumes coming from semantically heterogeneous sources featuring significant variety between them. In this context, data integration aims to provide a unified view over data residing at different sources through a global schema,...


Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources (KADASH)

ion. For example, the program of study of a student in a data source can be specified as Graduate Program (higher level of abstraction), while the program of study of a different student in the same data source (or even a different data source) can be specified as Doctoral Program (lower level of abstraction). [2005 IEEE ICDM Workshop on KADASH] The workshop brings together researchers in relevant...


Knowledge Acquisition from Semantically Heterogeneous Data

Recent advances in sensors, digital storage, computing and communications technologies have led to a proliferation of autonomously operated, geographically distributed data repositories in virtually every area of human endeavor, including e-business and e-commerce, e-science, e-government, security informatics, etc. Effective use of such data in practice (e.g., building useful predictive models...


A Methodology for Terminology-based Knowledge Acquisition and Integration

In this paper we propose an integrated knowledge management system in which terminology-based knowledge acquisition, knowledge integration, and XML-based knowledge retrieval are combined using tag information and ontology management tools. The main objective of the system is to facilitate knowledge acquisition through query answering against XML-based documents in the domain of molecular biolog...



Publication date: 2005